docs(simd): dispatch architecture + parity matrix + tech debt + integration plan #171
New design doc at .claude/knowledge/simd-dispatch-architecture.md
covering four artifacts in one place:
1. Dispatch architecture: three explicit cargo configs (v3 default,
   v4 explicit, native explicit) + optional runtime LazyLock path.
   Each is a conscious cargo invocation; no silent fallback.
2. Parity matrix: typed lane primitives × backend (avx512 / avx2 /
   neon / nightly / scalar). Surfaces the AVX2 gap: only F32x16,
   F64x8, U8x32, F16Scaler exist; the other 14 cross-arch lanes are
   missing → CI SIGILL on the v3 baseline.
3. Technical debt matrix: 10 issues (TD-SIMD-1..10) ranked P0..P3.
   P0: default config bakes AVX-512 → CI SIGILL on AVX2-only runners.
   P0: AVX2 missing 10 two-half wrappers (U64x8, I32x16, …).
   P1: simd.rs has no `feature = "nightly-simd"` arm → simd_nightly/*
       unreachable despite being the most-complete backend.
   P1: NEON parity gap symmetric to AVX2.
   …through P3 CI-matrix entries.
4. Integration plan: six sequenced phases, each a single-PR worker:
   Phase 1 unblock CI (config flip + AVX2 wrappers + dispatch arms)
   Phase 2 unblock nightly-simd polyfill
   Phase 3 NEON parity
   Phase 4 scalar→file + macro + F16 honesty
   Phase 5 runtime dispatch (opt-in)
   Phase 6 AVX-512 explicit CI job
Pinned by the discussion thread on PR #170 (CI run 26151746204): the
38 simd_avx2/simd_amx/simd_ops/simd_soa test failures at uniform 19 s
timeouts are the v3-runner / v4-baked-binary SIGILL pattern. This
doc captures the architecture target, the gaps, and the path.
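For readers unfamiliar with the optional runtime path mentioned in item 1, the following is a minimal sketch of what a `LazyLock`-backed capability probe could look like. It is not taken from the design doc: the `CpuCaps` field names, the `Backend` enum, and `select_backend` are hypothetical names chosen for illustration, and the preference order (AVX-512, then AVX2, then NEON, then scalar) is an assumption.

```rust
use std::sync::LazyLock;

/// Hypothetical capability record; field names are assumptions, not the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuCaps {
    pub avx2: bool,
    pub avx512f: bool,
    pub neon: bool,
}

/// Probe the running CPU on x86_64 using the standard detection macro.
#[cfg(target_arch = "x86_64")]
fn detect() -> CpuCaps {
    CpuCaps {
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        neon: false,
    }
}

/// Probe the running CPU on aarch64.
#[cfg(target_arch = "aarch64")]
fn detect() -> CpuCaps {
    CpuCaps {
        avx2: false,
        avx512f: false,
        neon: std::arch::is_aarch64_feature_detected!("neon"),
    }
}

/// Every other target reports no SIMD extensions and ends up on scalar.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn detect() -> CpuCaps {
    CpuCaps { avx2: false, avx512f: false, neon: false }
}

/// Detection runs once, on first access, and the result is cached.
pub static CPU_CAPS: LazyLock<CpuCaps> = LazyLock::new(detect);

/// Hypothetical backend tag used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Pick the widest backend the capability record allows (assumed order).
pub fn select_backend(caps: CpuCaps) -> Backend {
    if caps.avx512f {
        Backend::Avx512
    } else if caps.avx2 {
        Backend::Avx2
    } else if caps.neon {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}
```

A caller would write `select_backend(*CPU_CAPS)` at the dispatch point. Unlike a binary with AVX-512 baked in at compile time, this shape cannot hit SIGILL on an AVX2-only runner as long as the wider code paths are only reached through the selected backend.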
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c1942ba00
#[cfg(all(feature = "nightly-simd", any(target_arch = "x86_64", target_arch = "aarch64")))]
pub use crate::simd_nightly::{F32x16, F64x8, U8x32, U8x64, U16x32, U32x16, U64x8, I8x32, I8x64, I16x16, I16x32, I32x16, I64x8, F32Mask16, F64Mask8, BF16x16, BF16x8};
Keep nightly-simd dispatch valid on non-x86/aarch64
The proposed nightly-simd arm is constrained to x86_64/aarch64, but the same section describes this mode as usable on “or any” target. With the shown scalar fallback condition (not(..., feature = "nightly-simd")), enabling nightly-simd on targets like wasm32 or riscv would leave no matching re-export path in simd.rs, causing unresolved SIMD type exports. Please either broaden the nightly arm to all targets that should be supported or adjust the scalar fallback predicate so one backend always matches.
Fixed in 9016621. Dropped the target_arch = x86_64 | aarch64 constraint from the nightly arm — core::simd is portable, so the arm is now #[cfg(feature = "nightly-simd")] unconditional and catches wasm32 / riscv too. Also tightened the scalar fallback predicate to the exact negation of arms 1-4 (not(any(nightly-simd, all(x86_64, avx2), aarch64))) so x86_64-without-AVX2 also lands on scalar. Result: exactly one arm matches on every (target, feature) pair.
Generated by Claude Code
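The "exactly one arm matches" invariant described in the reply above can be sketched as a set of mutually exclusive `cfg` predicates. This is an illustration only: the real `simd.rs` re-exports lane types rather than a string constant, and both the `ACTIVE_BACKEND` name and the use of `target_feature` to distinguish the AVX-512 and AVX2 arms are assumptions. Because each arm defines the same item, two matching arms would be a duplicate-definition compile error and zero matching arms would be an unresolved-name error.

```rust
// Arm: portable core::simd polyfill, valid on any target (wasm32, riscv, ...).
#[cfg(feature = "nightly-simd")]
pub const ACTIVE_BACKEND: &str = "simd_nightly";

// Arm: x86_64 built with AVX-512 (assumed to be keyed on target_feature).
#[cfg(all(
    not(feature = "nightly-simd"),
    target_arch = "x86_64",
    target_feature = "avx2",
    target_feature = "avx512f"
))]
pub const ACTIVE_BACKEND: &str = "simd_avx512";

// Arm: x86_64 built with AVX2 but without AVX-512.
#[cfg(all(
    not(feature = "nightly-simd"),
    target_arch = "x86_64",
    target_feature = "avx2",
    not(target_feature = "avx512f")
))]
pub const ACTIVE_BACKEND: &str = "simd_avx2";

// Arm: aarch64 (NEON).
#[cfg(all(not(feature = "nightly-simd"), target_arch = "aarch64"))]
pub const ACTIVE_BACKEND: &str = "simd_neon";

// Scalar fallback: the exact negation of the arms above, so x86_64
// without AVX2 and every other architecture land here.
#[cfg(not(any(
    feature = "nightly-simd",
    all(target_arch = "x86_64", target_feature = "avx2"),
    target_arch = "aarch64"
)))]
pub const ACTIVE_BACKEND: &str = "scalar";

/// Referencing the constant forces a compile error if no arm matched.
pub fn active_backend() -> &'static str {
    ACTIVE_BACKEND
}
```

Compiled for plain `x86_64` with no extra target features, this resolves to the scalar arm; adding `-C target-feature=+avx2` should move it to the AVX2 arm.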
| `I16x32` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `I32x16` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `I64x8` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `BF16x8` | ✅ `__m128bh` | ❌ | ❌ | ❌ | ✅ |
Correct BF16x8 parity status for simd_nightly
The parity matrix marks BF16x8 as missing in simd_nightly, but src/simd_nightly/mod.rs already re-exports BF16x8 (alongside BF16x16). This mismatch can mis-prioritize follow-up work by treating an implemented lane as a gap, so the matrix row should be updated to reflect current backend coverage.
Fixed in 9016621. BF16x8 row column 4 (simd_nightly) flipped from ❌ to 🔵 — confirmed by grep BF16x8 src/simd_nightly/mod.rs showing pub use bf16_types::{BF16x16, BF16x8};. Matrix now matches actual backend coverage.
Generated by Claude Code
- Drop the `target_arch = x86_64 | aarch64` constraint from the nightly-simd arm. `core::simd` is portable, so this arm must match on wasm32 / riscv too; otherwise enabling `nightly-simd` on those targets leaves no backend (every other arm excludes it).
- Tighten the scalar fallback predicate to the exact negation of arms 1-4 so x86_64-without-AVX2 also routes to scalar.
- Correct `BF16x8` parity row: `simd_nightly` already re-exports `BF16x8` (alongside `BF16x16`); the matrix was stale.
Summary
Docs-only PR. New design doc `.claude/knowledge/simd-dispatch-architecture.md` capturing four artifacts in one place per the discussion thread on PR #170:
- Dispatch architecture: three explicit cargo configs (`v3` default, `v4` explicit AVX-512, `native` explicit) + optional runtime `LazyLock<CpuCaps>` mode. Each is a conscious cargo invocation; no silent fallback.
- Parity matrix: typed lane primitives × backend (`simd_avx512` / `simd_avx2` / `simd_neon` / `simd_nightly` / scalar). Surfaces the AVX2 gap: only `F32x16`, `F64x8`, `U8x32`, `F16Scaler` exist; the other 14 cross-arch lanes are missing.
- Technical debt matrix: 10 issues (`TD-SIMD-1`..`TD-SIMD-10`) ranked P0..P3. P0s are the v4-baked-binary default + the AVX2 wrapper gap. P1s are the unwired `nightly-simd` dispatch arm + NEON parity. P2-P3s are scalar/macro/runtime/CI ergonomics.
- Integration plan: six sequenced phases, each a single-PR worker.
1 file, +294 / 0 vs master.
Why now
PR #170 (`tests/1.95.0` CI run 26151746204/76920666348) showed 38 tests failing uniformly at ~19 s timeouts in `simd_avx2::*` / `simd_amx::*` / `simd_ops::*` / `simd_soa::*`: the SIGILL pattern from a v4-baked binary running on an AVX2-only GitHub runner. The matrix + tech debt capture the gaps; the integration plan sequences the fix.
Out of scope
No code changes in this PR. Phase 1 (the actual config flip + AVX2 wrappers + dispatch arm) is a separate follow-up PR.
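For orientation on what a "two-half wrapper" in Phase 1 might involve, here is a rough sketch of a 16-lane `I32x16` built from two 8-lane halves. It is an assumption-laden illustration, not the follow-up PR's code: plain arrays stand in for the 256-bit registers (an AVX2 build would presumably hold `__m256i` and call the matching intrinsics), and the `Half` type and the method names `splat`, `from_array`, and `to_array` are hypothetical.

```rust
use std::ops::Add;

/// Stand-in for one 256-bit half holding eight i32 lanes.
/// Assumption: a real AVX2 backend would wrap `__m256i` here instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Half([i32; 8]);

impl Half {
    /// Lane-wise wrapping add, mirroring the wraparound behaviour of
    /// packed 32-bit integer addition.
    fn add(self, rhs: Half) -> Half {
        let mut out = [0i32; 8];
        for i in 0..8 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        Half(out)
    }
}

/// 16-lane vector presented to callers; internally two 8-lane halves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct I32x16 {
    lo: Half,
    hi: Half,
}

impl I32x16 {
    pub const LANES: usize = 16;

    /// Broadcast one value to all 16 lanes.
    pub fn splat(v: i32) -> Self {
        I32x16 { lo: Half([v; 8]), hi: Half([v; 8]) }
    }

    /// Split a 16-element array into the low and high halves.
    pub fn from_array(a: [i32; 16]) -> Self {
        let mut lo = [0i32; 8];
        let mut hi = [0i32; 8];
        lo.copy_from_slice(&a[..8]);
        hi.copy_from_slice(&a[8..]);
        I32x16 { lo: Half(lo), hi: Half(hi) }
    }

    /// Reassemble the two halves into one 16-element array.
    pub fn to_array(self) -> [i32; 16] {
        let mut out = [0i32; 16];
        out[..8].copy_from_slice(&self.lo.0);
        out[8..].copy_from_slice(&self.hi.0);
        out
    }
}

impl Add for I32x16 {
    type Output = I32x16;

    /// Each operation is applied to both halves independently.
    fn add(self, rhs: I32x16) -> I32x16 {
        I32x16 { lo: self.lo.add(rhs.lo), hi: self.hi.add(rhs.hi) }
    }
}
```

The point of the shape is that callers see the same 16-lane type on every backend, while an AVX2 build issues two 256-bit operations where an AVX-512 build would issue one 512-bit operation.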
🤖 Generated with Claude Code